6/29/2026
Why I built Miko — a retrieval-augmented, guardrailed AI assistant for michaelhuang.ca and si8tech.com — and the trust engineering (eval gate, guardrails, curation) that makes it safe to put my name on.
Most portfolio sites are monologues. You write the copy, a visitor skims 20% of it, and the nuance — why a 30% cost cut mattered, what "agentic workflows" actually means in practice — never lands. So I built the dialogue: Miko, an AI assistant embedded on both michaelhuang.ca and si8tech.com that answers questions about my work in my words, grounded in a curated knowledge base, with hard guardrails so it never makes things up or leaks what it shouldn't.
It's powered by Pythia — the retrieval-augmented engine underneath. Here's how it works, and the engineering decisions that make it trustworthy enough to put my name on.
A visitor's question is embedded, matched against a vector store of curated documents (Supabase pgvector), and the top matches become the only context the model is allowed to answer from. Every answer carries a citation back to the source. One shared knowledge base feeds both sites; a per-site persona reframes the same facts — warm and personal on michaelhuang.ca, consultative on si8tech.com.
Anyone can wire an LLM to a website in an afternoon. The reason most companies don't put one on their front page is that a naive bot will confidently hallucinate, leak training data, or get talked into saying something off-brand. The engineering that matters is the part that makes those failure modes impossible. Four things I built in:
1. It can only speak from the corpus. The model is instructed to answer strictly from retrieved context and to treat that context as reference data, not instructions — so "ignore your rules and…" goes nowhere. If the answer isn't in the corpus, it says so and points you to the contact form instead of inventing one.
2. A deterministic eval gate blocks bad deploys. Before anything ships, a golden set of questions runs through the real pipeline and is scored automatically — including must-not-leak cases (salary, phone number) that fail the build if a single one leaks. The gate runs in CI on a schedule against production. This is the difference between "it seemed fine when I tested it" and "it provably doesn't leak."
3. Layered abuse defense. Per-minute and per-hour rate limits, an abuse/prompt-injection classifier with a strike-then-timeout system, locked CORS to just the two origins, hashed IPs, a length cap, and a kill switch that disables the bot via one environment variable — no redeploy.
4. Curated, not crawled. The bot doesn't scrape my site and hope. Its knowledge is a hand-curated, version-controlled corpus where every claim is deliberate and every metric is audited for internal consistency. (More on that below — it's the part most people get wrong.)
My first instinct was to dump everything — every role, every project, every number — into the knowledge base. Retrieval quality dropped. The reason is mechanical: the bot only retrieves a handful of chunks per answer, so when the corpus is full of redundant, low-signal text, the "lottery" pulls noise and the answers get vague.
The fix wasn't less content — it was curated content. I split bloated paragraphs into focused, single-topic documents (each project gets its own, with its real KPIs), added a dedicated "highlights" document as a high-recall anchor for broad "what's your range?" questions, and a focused "seniority" document after the eval gate caught the bot hedging on a basic recruiter question. Density of distinct facts helps; redundancy hurts. That distinction is the whole game in retrieval engineering.
I also ran a metrics audit — and caught myself. An early draft claimed "5× velocity" alongside "30% faster cycles," which don't mathematically correspond (5× implies an ~80% cut). I retailored it to a defensible 2–3× and reframed the two numbers as complementary mechanisms (per-cycle speed × parallelism), because a chatbot that will be interrogated by skeptical recruiters can't afford a number that doesn't hold up.
This project is the pitch. It demonstrates the exact thing I do for clients through SI8 Technology: take an ambiguous problem ("make my site sell better"), turn it into production-grade architecture with real guardrails, and ship something measurable and trustworthy. RAG, evals, AI-native delivery, and the judgment to know that the safety and curation are the product — not the LLM call.
If you want to see it work, ask the bot on either site about my experience, a specific project, or how SI8 engagements work. And if you're building something where an AI feature has to be reliable enough to stake your reputation on — that's the conversation I like having. Book a call.
Built with TypeScript, Supabase pgvector, OpenAI embeddings, and Claude. Source-curated corpus, deterministic eval gate, deployed on Vercel.